Introduction

UNWPP provides age-distributed population in two forms: a quinquennial population, with 5-year resolution in both age and time, and an interpolated population, with 1-year resolution in both age and time. In both cases, the resolution of ages between 80 and 100 changes at the year 1990, as shown below.

Age resolution for UNWPP Population datasets

               1950-1989                  1990-2100
Quinquennial   0-4, 5-9 … 75-79, 80+      0-4, 5-9 … 95-99, 100+
Interpolated   0-0, 1-1 … 79-79, 80+      0-0, 1-1 … 99-99, 100+

UNWPP do hold data points internally that would give consistent data, including the higher resolution at old ages, for the full date range, but they do not feel the data quality is sufficient to publish. The change in format and representation at 1990 is nevertheless inconvenient for modelling groups. We have been asked to standardise this across the time series, so that the published UNWPP data from 1990 onwards can be used, supplemented before 1990 with a reasonable approximation of the over-80s that sums to the published 80+ value.

This report describes the method we have chosen, and shows graphs of the results. Note that in the diagrams describing the algorithm below, it is sometimes difficult to show an “80+” data point where an “80-80” age point may also exist. See the legend on each graph to distinguish them where necessary.

Note that this algorithm is applied before the small-country work documented elsewhere (Kosovo, Tuvalu and Marshall Islands); the age distributions for those countries will be based on the over-80 extended populations for Serbia, Tonga and Micronesia, which we generate here.

Method

Out of several attempted approaches, we describe here the one giving results that look the most convincing visually, when plotted alongside the surrounding data points. This is the best we can really do, since we don’t have external data with which to compare.

We begin by looking at the UNWPP data for the interpolated population.

We now construct a life-table for ages 79 to 99 at the earliest time we can, by following a single cohort: it begins with people aged 79 in 1989, and continues until it records the 99-year-olds in 2009. This results in a sequence of 21 numbers, which will normally be decreasing as that cohort grows older. (Any rare increases will be where immigration exceeds mortality for that age group.)
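As a minimal sketch of this step, assuming the interpolated population is held in a dictionary keyed by (year, age), the life-table is simply the diagonal of the grid followed by one cohort. The data layout and the function names are hypothetical, chosen only for illustration.

```python
def cohort_life_table(pop, start_year=1989, start_age=79, end_age=99):
    """Follow one cohort along the diagonal of a (year, age) grid.

    `pop` is assumed to map (year, age) -> population.
    """
    steps = end_age - start_age + 1
    return [pop[(start_year + k, start_age + k)] for k in range(steps)]


def survival_ratios(life_table):
    """Proportional change from each age to the next (normally below 1).

    Assumes no entry of the life-table is zero.
    """
    return [later / earlier for earlier, later in zip(life_table, life_table[1:])]


# Toy data only: a cohort that loses 40 people per year from age 79 in 1989.
toy_pop = {(1989 + k, 79 + k): 1000 - 40 * k for k in range(21)}
table = cohort_life_table(toy_pop)   # 21 values, ages 79..99
ratios = survival_ratios(table)      # 20 age-to-age ratios
```

The ratios, rather than the raw counts, are what can be carried over to other cohorts.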

We then make the simple assumption that the proportional changes from one age-year to the next in this life-table can be applied to all earlier cohorts. This lets us estimate values for a parallelogram shaped section of the grid, by applying the rates from the life-table in an open-ended way, starting at age 79.
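The open-ended projection can be sketched as follows. This is an illustration under stated assumptions, not the exact implementation: the (year, age) dictionary, the function names, and the choice of 1950-1969 as the starting years (the cohorts that reach age 99 before 1990) are all our reading of the description above.

```python
def project_cohort(pop_at_79, ratios):
    """Apply life-table ratios to a known age-79 value; returns ages 79..99."""
    sizes = [pop_at_79]
    for ratio in ratios:
        sizes.append(sizes[-1] * ratio)
    return sizes


def fill_parallelogram(pop, ratios, first_year=1950, last_year=1969):
    """Estimate ages 80..99 for cohorts aged 79 in first_year..last_year.

    The year range is an assumption: these cohorts reach age 99 before
    1990, so they have no later published value to meet.
    """
    for year in range(first_year, last_year + 1):
        sizes = project_cohort(pop[(year, 79)], ratios)
        for k in range(1, len(sizes)):
            pop[(year + k, 79 + k)] = sizes[k]
    return pop


# Toy data only: a constant ratio of 0.5 and 1024 people aged 79 each year.
toy_ratios = [0.5] * 20
toy_pop = {(year, 79): 1024.0 for year in range(1950, 1970)}
fill_parallelogram(toy_pop, toy_ratios)
```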

We then fill the right-triangle, in which each cohort lies between two known data points: its size at age 79 in one of the years 1970-1988, and its size at whichever age it reaches in 1990. We use the appropriate subset of the life-table, scaled linearly to span the range between the two known points.
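One way to read the linear scaling is as an affine mapping of the life-table segment, so that its first value lands on the known age-79 point and its last value lands on the known 1990 point. The sketch below implements that reading only; the function name and the fallback for a flat segment are our own assumptions.

```python
def fit_between(life_table_segment, start_value, end_value):
    """Rescale a life-table segment so it starts and ends on known values.

    `life_table_segment` holds the reference cohort from age 79 up to the
    age the target cohort reaches in 1990. The affine mapping is one
    possible reading of "linear scaling", not a confirmed formula.
    """
    if len(life_table_segment) < 2:
        raise ValueError("need at least two life-table entries")
    first, last = life_table_segment[0], life_table_segment[-1]
    span = first - last
    steps = len(life_table_segment) - 1
    fitted = []
    for k, value in enumerate(life_table_segment):
        if span != 0:
            weight = (value - last) / span   # 1 at age 79, 0 at the 1990 age
        else:
            weight = 1 - k / steps           # flat segment: use a straight line
        fitted.append(end_value + weight * (start_value - end_value))
    return fitted


# Toy data only: a cohort aged 79 in 1987 reaches age 82 in 1990,
# so it uses the first four life-table entries.
fitted = fit_between([100, 80, 60, 50], start_value=200, end_value=100)
```

The intermediate ages keep the shape of the life-table while both end points are matched exactly.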

Thirdly, we look at the left-triangle, which consists of the cohorts aged between 80 and 99 in 1950; these cohorts reach the age of 99 between 1950 and 1969. We have neither a start nor an end point to work with here; all we have are the age-79 and 80+ points.

We assume that the age profile from ages 79 to 99 is the same in 1950 as it was in 1970, the earliest year for which we now have a full age profile. We multiply that profile by (80+ in 1950)/(80+ in 1970), to approximate the age profile in 1950.

Then for each cohort in 1950 from age 80 to 98, we use the life-table from earlier to calculate the decrease in cohort size as they age, and as the year increases.
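The two steps for the left-triangle can be sketched as below. The dictionary layouts and function names are hypothetical, and `ratios[i]` is assumed to hold the life-table change from age 79+i to age 80+i.

```python
def scale_profile(profile_1970, plus80_1950, plus80_1970):
    """Scale the 1970 single-age profile by the ratio of published 80+ totals."""
    factor = plus80_1950 / plus80_1970
    return {age: count * factor for age, count in profile_1970.items()}


def age_cohorts_forward(profile_1950, ratios, start_year=1950):
    """Age each 1950 cohort (ages 80..98) forward until it reaches age 99.

    Returns {(year, age): estimate} for the years after `start_year`.
    """
    estimates = {}
    for age in range(80, 99):
        size, year = profile_1950[age], start_year
        for next_age in range(age + 1, 100):
            size *= ratios[next_age - 80]   # change from next_age-1 to next_age
            year += 1
            estimates[(year, next_age)] = size
    return estimates


# Toy data only: a flat 1970 profile, halved to match a smaller 1950 total.
profile_1950 = scale_profile({age: 200.0 for age in range(79, 100)}, 50.0, 100.0)
triangle = age_cohorts_forward(profile_1950, [0.5] * 20)
```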

We now consider the 100+ age group. Here, we sum ages 80 to 100+ to give an 80+ value for 1990, and then apply that year's ratio of 100+ to 80+ to all previous years, for which we have the 80+ data point. Again this is a simplistic assumption, especially since cohort-age behaviour varies in a periodic way. But the number of people in the 100+ category is quite small, and we really have no other guiding data.
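A minimal sketch of this calculation, assuming the published 1990 values are held in a dictionary with keys 80..99 and "100+" (a hypothetical layout):

```python
def backfill_100_plus(ages_1990, plus80_by_year):
    """Estimate 100+ before 1990 from each year's published 80+ value.

    `ages_1990` is assumed to map 80..99 and "100+" to the published 1990
    populations; `plus80_by_year` maps year -> published 80+ value.
    """
    total_80_plus_1990 = sum(ages_1990.values())
    fraction = ages_1990["100+"] / total_80_plus_1990
    return {year: value * fraction for year, value in plus80_by_year.items()}


# Toy data only: 100+ makes up 1% of the 80+ total in 1990.
toy_1990 = {age: 495 for age in range(80, 100)}
toy_1990["100+"] = 100
estimates = backfill_100_plus(toy_1990, {1950: 2000, 1989: 9000})
```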

Lastly, we need to normalise the data and ensure that the new estimates of ages 80-100+ sum to give the original UNWPP 80+ point. The algorithm above tended to overestimate population as the life-table used is from a later period than the target we are trying to emulate.

A linear scaling is not sufficient here; it opens up a very obvious gap at the “younger” ages of 80 and 81, since they have the largest populations. If anything, we want to reduce the tail of the population, since this is the area we have artificially increased by using a life-table based on longer-lived populations.

We used an algorithm that reduces the age-80 population by 1%, and adjusts each of the ages 81 to 100+ by (1 + (age - 80)x)%, iterating to find the value of x such that the total of the 80 to 100+ age groups matches the original 80+ target.
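The sketch below shows one reading of this adjustment, in which age a is reduced by (1 + (a - 80)x) percent and 100+ is treated as age 100. Under that reading the total is linear in x, so the sketch solves for x directly instead of iterating; this is a swapped-in shortcut, and the function name and data layout are hypothetical.

```python
def taper_to_target(estimates, target):
    """Adjust ages 80..100 (100 standing for 100+) so they sum to `target`.

    Assumed reading: age a is reduced by (1 + (a - 80) * x) percent.
    The total is then linear in x, so x is solved in closed form.
    """
    total = sum(estimates.values())
    weighted = sum(count * (age - 80) for age, count in estimates.items())
    if weighted == 0:
        raise ValueError("no population above age 80 to taper")
    x = (0.99 * total - target) * 100 / weighted
    adjusted = {
        age: count * (1 - (1 + (age - 80) * x) / 100)
        for age, count in estimates.items()
    }
    return adjusted, x


# Toy data only: three ages that overshoot a target of 1600 by 100.
adjusted, x = taper_to_target({80: 1000.0, 81: 500.0, 82: 200.0}, target=1600.0)
```

Because the percentage grows with age, most of the correction falls on the tail and age 80 is barely moved.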

For some countries, this caused a few small negative numbers in some of the later ages. Where this occurred, we set those values to zero, and performed a standard linear scaling to lift the population across the ages and correct the total. The final results are then rounded to integers while preserving the total (i.e. they are truncated to integers first, and those with the largest remainders are increased by one, in order, until the total is met).
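The clean-up and rounding can be sketched as follows. The function names are hypothetical, and `target` is assumed to be the integer 80+ total being matched.

```python
import math


def clip_and_rescale(values, target):
    """Set negative values to zero, then scale linearly to sum to `target`."""
    clipped = [max(v, 0.0) for v in values]
    total = sum(clipped)
    if total == 0:
        return clipped
    return [v * target / total for v in clipped]


def round_preserving_total(values, target):
    """Truncate to integers, then add 1 to the largest remainders
    until the integer total equals `target`."""
    floors = [math.floor(v) for v in values]
    shortfall = max(0, target - sum(floors))
    by_remainder = sorted(
        range(len(values)), key=lambda i: values[i] - floors[i], reverse=True
    )
    for i in by_remainder[:shortfall]:
        floors[i] += 1
    return floors


# Toy data only.
rescaled = clip_and_rescale([-1.0, 6.0, 4.0], target=20)
rounded = round_preserving_total([3.7, 2.2, 4.1], target=10)
```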

The quinquennial population is then calculated by summing ages and years into groups of five, and copying the 100+ data. Since the UNWPP 80+ value for 1950 in the quinquennial population data matches that for 1950 in the interpolated population data, and continues to do so for 1955, 1960, etc., adding the new points to the quinquennial population is trivial.
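As an illustration of the age grouping only, for a single year, a sketch might look like this. The function name and dictionary layout are hypothetical, and the handling of the time dimension is deliberately left out.

```python
def to_five_year_age_groups(single_ages, hundred_plus):
    """Sum single ages 80..99 into 5-year groups and copy 100+ unchanged.

    `single_ages` is assumed to map each age 80..99 to a population.
    """
    groups = {}
    for start in range(80, 100, 5):
        label = f"{start}-{start + 4}"
        groups[label] = sum(single_ages[age] for age in range(start, start + 5))
    groups["100+"] = hundred_plus
    return groups


# Toy data only: 10 people at every single age and 3 people aged 100+.
groups = to_five_year_age_groups({age: 10 for age in range(80, 100)}, 3)
```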

Results

The black dotted lines show the pre-existing data; we show ages 75-79 for all years, and ages 80-99 and 100+ from 1990 onwards. (We do not show the original 80+ points.) The other, rainbow-coloured lines show our estimates for ages 80-99 and 100+ for 1950-1989.